We simulate simple 3-trait networks in the two scenarios below. For each scenario, we look for the best and the worst learning conditions for each network learning method.
For evaluation, we:
For the simulations reported below, the kinship matrix used is for plants. Relatedness among mice or humans in a typical GWAS may be different, so its effect needs to be investigated too.
dagcoeff
dscout.2gtrait.identicalvar <- dscquery(dsc.outdir = "~/github_repos/pcgen2/progress/debug/",
targets = c("simulate.dagcoeff", "simulate.rangeGenVar",
"simulate.rangeEnvVar", "simulate.true_dag",
"simulate.data", "simulate.cor_traits",
"learn", "learn.learnt_dag", "learn.cor_residuals",
"learn.dagfit_tpr", "learn.dagfit_fpr",
"learn.dagfit_tdr", "learn.dagfit_shd",
"learn.genfit_tpr", "learn.genfit_fpr",
"learn.genfit_tdr" )) %>% as_tibble()
dscout.2gtrait.identicalvar %>% filter(learn == "pcgen") %>%
select(simulate.cor_traits, DSC, learn.cor_residuals, simulate.dagcoeff, simulate.rangeGenVar, simulate.rangeEnvVar) %>%
mutate(dagcoeff = factor(simulate.dagcoeff),
cor_residuals = learn.cor_residuals %>% map_dbl(3),
cor_traits = simulate.cor_traits,
replicate = factor(DSC)) %>%
ggplot(aes(x = cor_traits, y = cor_residuals, group = replicate)) +
geom_point(aes(color = dagcoeff)) +
geom_abline(aes(slope =1, intercept = 0)) +
# geom_smooth(se = FALSE) +
theme_classic() + facet_grid(rows = vars(simulate.rangeEnvVar), cols = vars(simulate.rangeGenVar))
In the figure above, rangeGenVar varies across columns and rangeEnvVar varies across rows. The plots in the bottom-left corner (low rangeGenVar (<0.1) and rather high rangeEnvVar (>0.5)) suggest that the traits and the residuals could be used interchangeably for learning a network; i.e., one would expect both pcRes and vanilla pc to work similarly with these values, regardless of the strength of correlation between traits. For other combinations, only when dagcoeff is large (>=2 in these figures) do we get similar patterns from both traits and residuals.
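One way to put a number on how closely the residual correlations track the trait correlations in each panel is the mean absolute distance from the identity line. The following is a minimal sketch on a made-up tibble: `toy` and all of its values are hypothetical and are not simulation output; only the column names mirror those used in the plotting code.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the columns used in the plot;
# the numbers are invented for illustration only.
toy <- tribble(
  ~rangeGenVar, ~rangeEnvVar, ~dagcoeff, ~cor_traits, ~cor_residuals,
  0.1, 0.5, 0.1, 0.10, 0.12,
  0.1, 0.5, 2.0, 0.80, 0.78,
  0.9, 0.1, 0.1, 0.40, 0.05,
  0.9, 0.1, 2.0, 0.85, 0.80
)

# Mean absolute deviation from the y = x line, per panel and dagcoeff.
agreement <- toy %>%
  group_by(rangeGenVar, rangeEnvVar, dagcoeff) %>%
  summarise(mean_abs_dev = mean(abs(cor_residuals - cor_traits)),
            .groups = "drop")

print(agreement)
```

A small `mean_abs_dev` in a panel would correspond to points lying close to the diagonal, which is the situation where traits and residuals carry nearly the same correlation signal.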
Now, let’s see what this means for each of the 3 learning methods: pcgen, pcRes and pc.
plot.performance(dscout.2gtrait.identicalvar, "vanilla")
plot.performance(dscout.2gtrait.identicalvar, "pcres")
plot.performance(dscout.2gtrait.identicalvar, "pcgen")
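As a reminder of what fit measures such as TPR, FPR, TDR and SHD capture, here is a minimal sketch that computes them at the skeleton level from two adjacency matrices. This is an illustration under stated assumptions, not the pipeline's own code: the function `skeleton_metrics` and the example graphs are hypothetical, and the SHD here counts only missing or extra edges, whereas a full structural Hamming distance would also count orientation differences.

```r
# Skeleton-level fit metrics for a learnt graph against the true graph.
# Both inputs are square 0/1 adjacency matrices with matching trait order.
skeleton_metrics <- function(true_adj, learnt_adj) {
  # Collapse direction: an edge exists if either orientation is present.
  true_skel   <- (true_adj + t(true_adj)) > 0
  learnt_skel <- (learnt_adj + t(learnt_adj)) > 0
  upper <- upper.tri(true_skel)          # count each trait pair once
  truth <- true_skel[upper]
  found <- learnt_skel[upper]

  tp <- sum(truth & found)
  fp <- sum(!truth & found)
  fn <- sum(truth & !found)
  tn <- sum(!truth & !found)

  c(tpr = tp / (tp + fn),   # share of true edges recovered
    fpr = fp / (fp + tn),   # share of absent edges wrongly added
    tdr = tp / (tp + fp),   # share of learnt edges that are real
    shd = fp + fn)          # skeleton differences only
}

# Hypothetical 3-trait example: true chain 1 -> 2 -> 3;
# the learnt graph has 1 -> 2 and a spurious 1 -> 3.
true_adj <- matrix(0, 3, 3); true_adj[1, 2] <- 1; true_adj[2, 3] <- 1
learnt_adj <- matrix(0, 3, 3); learnt_adj[1, 2] <- 1; learnt_adj[1, 3] <- 1

m <- skeleton_metrics(true_adj, learnt_adj)
print(m)
```

Note that the ratios become `NaN` when a denominator is zero (for example, TDR when no edge is learnt), so any summary over replicates would need to decide how to treat such cases.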
These plots make me think:
pcgen. With the other two, there is not really a big difference.
pcres and pcgen, and are comparable to vanilla. At larger dagcoeff, the methods are ideal.
pcres and pcgen. For a sufficiently large dagcoeff, they both outperform vanilla. However, if it is low (e.g. 0.1), they both do worst.
pcgen learns this pattern
rangeGenVar, large rangeEnvVar and any dagcoeff
i.e., for each trait, the variances are chosen at random: \(\in \{0, var\}\).
dscout.2gtrait.diffvar <- dscquery(dsc.outdir= "~/github_repos/pcgen2/progress/simulation.2gtrait.diffvar/",
targets = c("simulate.dagcoeff", "simulate.rangeGenVar",
"simulate.rangeEnvVar", "simulate.true_dag",
"simulate.data", "simulate.cor_traits",
"learn", "learn.learnt_dag", "learn.cor_residuals",
"learn.dagfit_tpr", "learn.dagfit_fpr",
"learn.dagfit_tdr", "learn.dagfit_shd",
"learn.genfit_tpr", "learn.genfit_fpr",
"learn.genfit_tdr" )) %>% as_tibble()
Similarly, we check the correlations between traits and residuals, and we see trends similar to those obtained when the variance was the same for all traits.
dscout.2gtrait.diffvar %>% filter(learn == "pcgen") %>%
select(simulate.cor_traits, DSC, learn.cor_residuals, simulate.dagcoeff, simulate.rangeGenVar, simulate.rangeEnvVar) %>%
mutate(dagcoeff = factor(simulate.dagcoeff),
cor_residuals = learn.cor_residuals %>% map_dbl(3),
cor_traits = simulate.cor_traits,
replicate = factor(DSC)) %>%
ggplot(aes(x = cor_traits, y = cor_residuals, group = replicate)) +
geom_point(aes(color = dagcoeff)) +
geom_abline(aes(slope =1, intercept = 0)) +
# geom_smooth(se = FALSE) +
theme_classic() + facet_grid(rows = vars(simulate.rangeEnvVar), cols = vars(simulate.rangeGenVar))
And in terms of performance, the trend is also similar:
plot.performance(dscout.2gtrait.diffvar, "vanilla")
plot.performance(dscout.2gtrait.diffvar, "pcres")
plot.performance(dscout.2gtrait.diffvar, "pcgen")
dagcoeff
dscout.1gtrait.identicalvar <- dscquery(dsc.outdir =
"~/github_repos/pcgen2/progress/simulation.1gtrait.identicalvar/",
targets = c("simulate.dagcoeff", "simulate.rangeGenVar",
"simulate.rangeEnvVar", "simulate.true_dag",
"simulate.data", "simulate.cor_traits",
"learn", "learn.learnt_dag", "learn.cor_residuals",
"learn.dagfit_tpr", "learn.dagfit_fpr",
"learn.dagfit_tdr", "learn.dagfit_shd",
"learn.genfit_tpr", "learn.genfit_fpr",
"learn.genfit_tdr" )) %>% as_tibble()
In terms of correlations:
dscout.1gtrait.identicalvar %>% filter(learn == "pcgen") %>%
select(simulate.cor_traits, DSC, learn.cor_residuals, simulate.dagcoeff, simulate.rangeGenVar, simulate.rangeEnvVar) %>%
mutate(dagcoeff = factor(simulate.dagcoeff),
cor_residuals = learn.cor_residuals %>% map_dbl(3),
cor_traits = simulate.cor_traits %>% map_dbl(3), # we saved this as a matrix
replicate = factor(DSC)) %>%
ggplot(aes(x = cor_traits, y = cor_residuals, group = replicate)) +
geom_point(aes(color = dagcoeff)) +
geom_abline(aes(slope =1, intercept = 0)) +
# geom_smooth(se = FALSE) +
theme_classic() + facet_grid(rows = vars(simulate.rangeEnvVar), cols = vars(simulate.rangeGenVar))
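Because the correlations are stored as matrices here, it helps to be explicit about which entry `map_dbl(3)` extracts. The sketch below assumes a 3 x 3 correlation matrix (one row per trait), which is an assumption, and uses invented values; `cor_mat` and `cor_list` are hypothetical stand-ins for one list-column.

```r
library(purrr)

# Hypothetical list-column entry: a 3 x 3 trait correlation matrix.
cor_mat <- matrix(c(1.0, 0.3, 0.7,
                    0.3, 1.0, 0.5,
                    0.7, 0.5, 1.0), nrow = 3)

cor_list <- list(cor_mat, cor_mat)   # stands in for a list-column

# map_dbl(3) plucks the third element of each matrix by position.
by_position <- map_dbl(cor_list, 3)

# R stores matrices column by column, so element 3 is row 3, column 1.
by_index <- cor_mat[3, 1]

print(by_position)
print(by_index)
```

Under that assumption, the plotted value is the correlation between the first and the third trait; a matrix of a different size would place a different pair at position 3.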
And the performance:
plot.performance(dscout.1gtrait.identicalvar, "vanilla")
plot.performance(dscout.1gtrait.identicalvar, "pcres")
plot.performance(dscout.1gtrait.identicalvar, "pcgen")
So, one may say that compared with the first scenario, pcgen and pcRes do a better job of learning for a network of this nature.
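A comparison between scenarios like this one can also be checked numerically by stacking the two result tables and averaging a fit measure per method. The sketch below uses two tiny made-up tables: `scenario_2g`, `scenario_1g`, the method labels and the SHD values are all hypothetical; only the column names `learn` and `learn.dagfit_shd` come from the queries above.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins for two scenario tables; SHD values are invented.
scenario_2g <- tibble(learn = c("vanilla", "pcres", "pcgen"),
                      learn.dagfit_shd = c(2, 2, 1))
scenario_1g <- tibble(learn = c("vanilla", "pcres", "pcgen"),
                      learn.dagfit_shd = c(2, 1, 0))

# Stack both tables, tagging each row with its scenario,
# then average SHD per scenario and learning method (lower is better).
shd_by_scenario <- bind_rows(`2gtrait` = scenario_2g,
                             `1gtrait` = scenario_1g,
                             .id = "scenario") %>%
  group_by(scenario, learn) %>%
  summarise(mean_shd = mean(learn.dagfit_shd), .groups = "drop")

print(shd_by_scenario)
```

With the real query results in place of the toy tables, the same grouping could be extended with simulate.dagcoeff, simulate.rangeGenVar and simulate.rangeEnvVar to see where the difference between scenarios arises.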
dscout.1gtrait.diffvar <- dscquery(dsc.outdir= "~/github_repos/pcgen2/progress/simulation.1gtrait.diffvar/",
targets = c("simulate.dagcoeff", "simulate.rangeGenVar",
"simulate.rangeEnvVar", "simulate.true_dag",
"simulate.data", "simulate.cor_traits",
"learn", "learn.learnt_dag", "learn.cor_residuals",
"learn.dagfit_tpr", "learn.dagfit_fpr",
"learn.dagfit_tdr", "learn.dagfit_shd",
"learn.genfit_tpr", "learn.genfit_fpr",
"learn.genfit_tdr" )) %>% as_tibble()
And similarly, in terms of correlations:
dscout.1gtrait.diffvar %>% filter(learn == "pcgen") %>%
select(simulate.cor_traits, DSC, learn.cor_residuals, simulate.dagcoeff, simulate.rangeGenVar, simulate.rangeEnvVar) %>%
mutate(dagcoeff = factor(simulate.dagcoeff),
cor_residuals = learn.cor_residuals %>% map_dbl(3),
cor_traits = simulate.cor_traits %>% map_dbl(3),
replicate = factor(DSC)) %>%
ggplot(aes(x = cor_traits, y = cor_residuals, group = replicate)) +
geom_point(aes(color = dagcoeff)) +
geom_abline(aes(slope =1, intercept = 0)) +
# geom_smooth(se = FALSE) +
theme_classic() + facet_grid(rows = vars(simulate.rangeEnvVar), cols = vars(simulate.rangeGenVar))
And the performance:
plot.performance(dscout.1gtrait.diffvar, "vanilla")
plot.performance(dscout.1gtrait.diffvar, "pcres")
plot.performance(dscout.1gtrait.diffvar, "pcgen")
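The four queries above differ only in the output directory, so the shared target list could be factored out. The following is a minimal sketch: `common_targets` and `query_scenario` are hypothetical names introduced here, and the function relies only on the `dsc.outdir` and `targets` arguments of `dscquery` that the calls above already use.

```r
# Targets shared by all four queries in this report.
common_targets <- c("simulate.dagcoeff", "simulate.rangeGenVar",
                    "simulate.rangeEnvVar", "simulate.true_dag",
                    "simulate.data", "simulate.cor_traits",
                    "learn", "learn.learnt_dag", "learn.cor_residuals",
                    "learn.dagfit_tpr", "learn.dagfit_fpr",
                    "learn.dagfit_tdr", "learn.dagfit_shd",
                    "learn.genfit_tpr", "learn.genfit_fpr",
                    "learn.genfit_tdr")

# Hypothetical helper: query one scenario directory and return a tibble.
# dscrutils and tibble are only needed when the function is called.
query_scenario <- function(outdir, targets = common_targets) {
  tibble::as_tibble(dscrutils::dscquery(dsc.outdir = outdir,
                                        targets = targets))
}

# Example call (not run here, since it needs the DSC output on disk):
# dscout.1gtrait.diffvar <-
#   query_scenario("~/github_repos/pcgen2/progress/simulation.1gtrait.diffvar/")
```

Keeping the targets in one place would make it harder for the four scenario tables to drift apart when a new target is added.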